Google’s search engine) which endpoints to index and which to
ignore. It hinted that the robots.txt file may have more entries than
just these two and advised us to inspect it manually.
Lastly, it identified another endpoint at /wp-login.php, which is
the login page for WordPress, a well-known blogging platform. Navigate
to the main page at http://172.16.10.12/ to confirm that you've
identified a blog.
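If you'd rather confirm this from the terminal, one quick (and admittedly rough) check is to look for WordPress asset paths in the page source; this one-liner assumes the page references the usual wp-content directory:

$ curl -s http://172.16.10.12/ | grep -i wp-content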
Exercise 6: Automatically Exploring Non-Indexed Endpoints
Nikto advised us to manually explore the robots.txt file at
http://172.16.10.12/robots.txt to identify non-indexed endpoints.
Finding these endpoints is useful during a penetration test because
we can add them to our list of possible targets to test. If you open
this file, you should notice a list of paths:
User-agent: *
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
--snip--
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
Disallow: /donate.php
Disallow: /amount_to_donate.txt
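If you prefer to fetch the file from the terminal rather than a browser, a plain curl request does the job (the -s flag simply suppresses the progress meter):

$ curl -s http://172.16.10.12/robots.txt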
We identified some of these endpoints earlier (such as
/donate.php and /wp-admin), but others didn't appear in the Nikto
scan.
Now that we've found these endpoints, we can use bash to check
whether they really exist on the server. Let's put together a script
that performs the following activities: make an HTTP request to
robots.txt, iterate over each line of the response, parse the output
to extract only the paths, make an additional HTTP request to each
path, and check the status code each path returns to find out whether
it exists.
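Before we look at the full listing, here is one rough way to wire those steps together with curl, grep, and awk. This is a minimal sketch only, not the book's Listing 5-1; it assumes curl is installed and that the target is the lab host at 172.16.10.12:

#!/bin/bash
# Rough sketch: fetch robots.txt, pull the path out of each Disallow
# line, then request each path and report its HTTP status code.
TARGET="http://172.16.10.12"

curl -s "${TARGET}/robots.txt" | grep '^Disallow:' | awk '{print $2}' |
while read -r path; do
  # -o /dev/null discards the response body; -w '%{http_code}' prints
  # only the status code curl received for this path.
  status=$(curl -s -o /dev/null -w '%{http_code}' "${TARGET}${path}")
  echo "${path} => ${status}"
done

A 200 response suggests the path exists, while a 404 suggests it doesn't.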
Listing 5-1 is an example script that can do this work. It
relies on a cURL feature you'll find handy in your bash scripts:
built-in variables you can use when making HTTP requests, such as
the size of the request sent